Have you ever thought what is the end-goal / final deliverable that you want to achieve at your workplace? Well I did, because I was very open to my boss that I will eventually move back to my home in Malaysia, after another 2-3 years. Hence, I thought about what will be my “graduating project” here in ShopBack.
Due to my interest in understanding how Redshift works under the hood, I became the DA that will be pinged whenever we had any technical issue with Redshift or ETL pipeline. After spending some time dealing with ETL problems, I thought to myself:
It would be nice if we can commit to an SLA, committing to a promise to our stakeholders that looks something like: “Hey, we promise that table X Y and Z will be ready everyday at 8am SGT, updated with data up to 12am D-1”
I spoke to my boss about this, and he was supportive of it. The rough high-level plan we came up was:
- Kill any models exposed on Metabase that are no longer used.
- Consolidate active DAGs (we have too many active DAGs, it’s difficult to tell the lineage and which are the important ones)
- Create a “gantt chart” to identify which DAGs are scheduled at what time of the day
The goal is to maximize the number of models running all day long, while minimizing number of models in queue. Our empirical knowledge suggests that only around 5 dbt models can run at one time, given our current Redshift size.